data input
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Previous research has investigated the application of Multimodal Large Language Models (MLLMs) in understanding 3D scenes by interpreting them as videos. These approaches generally depend on comprehensive 3D data inputs, such as point clouds or reconstructed Bird's-Eye View (BEV) maps. In our research, we advance this field by enhancing the capability of MLLMs to understand and reason in 3D spaces directly from video data, without the need for additional 3D input. We propose a novel and efficient method called the Video-3D Geometry Large Language Model (VG LLM). Our approach utilizes a 3D visual geometry encoder to extract 3D prior information from video sequences. This information is then integrated with visual tokens and input into the MLLM. Extensive experiments show that our method achieves substantial improvements in various tasks related to 3D scene understanding and spatial reasoning, all directly learned from video sources. Impressively, our 4B model, which does not rely on explicit 3D data inputs, achieves competitive results compared to existing state-of-the-art methods, and even surpasses Gemini-1.5-Pro in the VSI-Bench evaluations.
T-KAER: Transparency-enhanced Knowledge-Augmented Entity Resolution Framework
Li, Lan, Fang, Liri, Liu, Yiren, Torvik, Vetle I., Ludaescher, Bertram
Entity resolution (ER) is the process of determining whether two representations refer to the same real-world entity and plays a crucial role in data curation and data cleaning. Recent studies have introduced the KAER framework, aiming to improve pre-trained language models by augmenting external knowledge. However, identifying and documenting the external knowledge that is being augmented and understanding its contribution to the model's predictions have received little to no attention in the research community. This paper addresses this gap by introducing T-KAER, the Transparency-enhanced Knowledge-Augmented Entity Resolution framework. To enhance transparency, three Transparency-related Questions (T-Qs) have been proposed: T-Q(1): What is the experimental process for matching results based on data inputs? T-Q(2): Which semantic information does KAER augment in the raw data inputs? T-Q(3): Which semantic information of the augmented data inputs influences the predictions? To address the T-Qs, T-KAER is designed to improve transparency by documenting the entity resolution processes in log files. In experiments, a citation dataset is used to demonstrate the transparency components of T-KAER. This demonstration showcases how T-KAER facilitates error analysis from both quantitative and qualitative perspectives, providing evidence on "what" semantic information is augmented and "why" the augmented knowledge influences predictions differently.
Unlocking Insights: Semantic Search in Jupyter Notebooks
Semantic search, a process aimed at delivering highly relevant search results by comprehending the searcher's intent and the contextual meaning of terms within a searchable dataspace, plays a pivotal role in information retrieval. In this paper, we investigate the application of large language models to enhance semantic search capabilities, specifically tailored for the domain of Jupyter Notebooks. Our objective is to retrieve generated outputs, such as figures or tables, associated functions and methods, and other pertinent information. We demonstrate a semantic search framework that achieves a comprehensive semantic understanding of the entire notebook's contents, enabling it to effectively handle various types of user queries. Key components of this framework include: 1) a data preprocessor designed to handle the diverse types of cells within Jupyter Notebooks, encompassing both markdown and code cells; and 2) an innovative methodology devised to address the token size limitations that arise with code-type cells. We implement a finer-grained approach to data input, transitioning from the cell level to the function level, which effectively resolves these issues.
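As a rough illustration of moving from cell-level to function-level input, the sketch below splits the source of one code cell into one chunk per top-level function, plus a chunk for the remaining statements. It is not the paper's implementation: the helper name `split_cell_into_functions`, the chunk dictionary layout, and the sample cell are all assumptions made for this example. It uses only Python's standard `ast` module.

```python
import ast


def split_cell_into_functions(cell_source):
    """Hypothetical helper: split one code cell into function-level chunks.

    Each top-level function becomes its own chunk; all other top-level
    statements are gathered into a single trailing "top_level" chunk.
    Decorator lines are not included in the function chunks.
    """
    tree = ast.parse(cell_source)
    chunks = []
    leftover = []
    for node in tree.body:
        # Recover the exact source text that produced this AST node.
        segment = ast.get_source_segment(cell_source, node)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            chunks.append({"kind": "function", "name": node.name, "source": segment})
        else:
            leftover.append(segment)
    if leftover:
        chunks.append({"kind": "top_level", "name": None, "source": "\n".join(leftover)})
    return chunks


# Illustrative cell contents (made up for this example).
cell = (
    "import math\n"
    "\n"
    "def area(r):\n"
    "    return math.pi * r ** 2\n"
    "\n"
    "def perimeter(r):\n"
    "    return 2 * math.pi * r\n"
    "\n"
    "print(area(1.0))\n"
)

chunks = split_cell_into_functions(cell)
for chunk in chunks:
    print(chunk["kind"], chunk["name"])
```

Each chunk is smaller than the whole cell, so it is more likely to fit within a model's token limit. How the chunks are then embedded or indexed is not specified in the abstract and is left out here.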
GuardML: Efficient Privacy-Preserving Machine Learning Services Through Hybrid Homomorphic Encryption
Frimpong, Eugene, Nguyen, Khoa, Budzys, Mindaugas, Khan, Tanveer, Michalas, Antonis
Machine Learning (ML) has emerged as one of data science's most transformative and influential domains. However, the widespread adoption of ML introduces privacy-related concerns owing to the increasing number of malicious attacks targeting ML models. To address these concerns, Privacy-Preserving Machine Learning (PPML) methods have been introduced to safeguard the privacy and security of ML models. One such approach is the use of Homomorphic Encryption (HE). However, the significant drawbacks and inefficiencies of traditional HE render it impractical for highly scalable scenarios. Fortunately, a modern cryptographic scheme, Hybrid Homomorphic Encryption (HHE), has recently emerged, combining the strengths of symmetric cryptography and HE to surmount these challenges. Our work seeks to introduce HHE to ML by designing a PPML scheme tailored for end devices. We leverage HHE as the fundamental building block to enable secure learning of classification outcomes over encrypted data, all while preserving the privacy of the input data and ML model. We demonstrate the real-world applicability of our construction by developing and evaluating an HHE-based PPML application for classifying heart disease based on sensitive ECG data. Notably, our evaluations revealed a slight reduction in accuracy compared to inference on plaintext data. Additionally, both the analyst and end devices experience minimal communication and computation costs, underscoring the practical viability of our approach. The successful integration of HHE into PPML provides a glimpse into a more secure and privacy-conscious future for machine learning on relatively constrained end devices.
Coincident Learning for Unsupervised Anomaly Detection
Humble, Ryan, Zhang, Zhe, O'Shea, Finn, Darve, Eric, Ratner, Daniel
Anomaly detection is an important task for complex systems (e.g., industrial facilities, manufacturing, large-scale science experiments), where failures in a sub-system can lead to low yield, faulty products, or even damage to components. While complex systems often have a wealth of data, labeled anomalies are typically rare (or even nonexistent) and expensive to acquire. Unsupervised approaches are therefore common and typically search for anomalies either by distance or density of examples in the input feature space (or some associated low-dimensional representation). This paper presents a novel approach called CoAD, which is specifically designed for multi-modal tasks and identifies anomalies based on coincident behavior across two different slices of the feature space. We define an unsupervised metric, $\hat{F}_\beta$, by analogy to the supervised classification $F_\beta$ statistic. CoAD uses $\hat{F}_\beta$ to train an anomaly detection algorithm on unlabeled data, based on the expectation that anomalous behavior in one feature slice is coincident with anomalous behavior in the other. The method is illustrated using a synthetic outlier data set and an MNIST-based image data set, and is compared to prior state-of-the-art on two real-world tasks: a metal milling data set and a data set from a particle accelerator.
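For reference, the supervised $F_\beta$ statistic mentioned above has the standard form shown below, where $P$ is precision, $R$ is recall, and $TP$, $FP$, $FN$ are the counts of true positives, false positives, and false negatives. The abstract does not define the unsupervised $\hat{F}_\beta$, so only the supervised statistic is given here.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_\beta = (1 + \beta^2)\,\frac{P \cdot R}{\beta^2 P + R}
```

Setting $\beta = 1$ gives the familiar $F_1$ score, and $\beta > 1$ weights recall more heavily than precision.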
Framework for developing quantitative agent based models based on qualitative expert knowledge: an organised crime use-case
Oetker, Frederike, Nespeca, Vittorio, Vis, Thijs, Duijn, Paul, Sloot, Peter, Quax, Rick
In order to model criminal networks for law enforcement purposes, a limited supply of data needs to be translated into validated agent-based models. What is missing in current criminological modelling is a systematic and transparent framework for modelers and domain experts that establishes a procedure for computational criminal modelling, including the translation of qualitative data into quantitative rules. For this, we propose FREIDA (Framework for Expert-Informed Data-driven Agent-based models). Throughout the paper, the criminal cocaine replacement model (CCRM) is used as an example case to demonstrate the FREIDA methodology. For the CCRM, a criminal cocaine network in the Netherlands is modelled in which the kingpin node is removed, the goal being for the remaining agents to reorganize after the disruption and return the network to a stable state. Qualitative data sources such as case files, literature and interviews are translated into empirical laws and, combined with quantitative sources such as databases, form the three dimensions (environment, agents, behaviour) of a networked ABM. Four case files are modelled and scored, both for training and for validation, to transition to the computational model phase and the application phase respectively. In the last phase, iterative sensitivity analysis, uncertainty quantification and scenario testing eventually lead to a robust model that can help law enforcement plan their intervention strategies. Results indicate the need for flexible parameters as well as additional case file simulations.
Data Drift vs. Concept Drift: What Is the Difference? - DATAVERSITY
Model drift refers to the phenomenon that occurs when the performance of a machine learning model degrades with time. This happens for various reasons, including data distribution changes, changes in the goals or objectives of the model, or changes to the environment in which the model is operating. There are two main types of model drift that can occur: data drift and concept drift. Data drift refers to the changing distribution of the data to which the model is applied. Concept drift refers to a changing underlying goal or objective for the model.
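As a rough sketch of how data drift in a single numeric feature might be checked, the example below compares a reference sample (for instance, training data) with a newer sample using the two-sample Kolmogorov-Smirnov statistic. This statistic is the largest gap between the two empirical distribution functions. The technique is swapped in for illustration; the article does not prescribe it. The function name `ks_statistic`, the synthetic Gaussian samples, and the `DRIFT_THRESHOLD` value of 0.1 are all assumptions, not recommendations.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs
    of the two samples (0.0 = identical, 1.0 = fully separated).
    """
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every data point from either sample.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


# Synthetic data, made up for this example.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(2000)]      # reference sample
same = [rng.gauss(0.0, 1.0) for _ in range(2000)]       # same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]    # mean has drifted

DRIFT_THRESHOLD = 0.1  # assumed cut-off, chosen only for this illustration

for label, sample in [("same", same), ("shifted", shifted)]:
    stat = ks_statistic(train, sample)
    print(label, "drift detected" if stat > DRIFT_THRESHOLD else "no drift")
```

A check like this only looks at the distribution of the input data. It says nothing about concept drift, which needs to be monitored separately. Library routines such as SciPy's `scipy.stats.ks_2samp` additionally report a p-value for the same statistic.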
AdTheorent Is Using Machine Learning To Predict Effective Inventory
Signal loss calls for the use of, well, other signals. "The biggest trend for us right now is finding ways to be less reliant on cookie data," said John Kirk, media director in charge of digital investment at 22Squared, an Atlanta-based media agency whose clients include Baskin-Robbins, Publix and Southeast Toyota. One alternative approach, Kirk said, is to "home in on audiences where we do have the data." In that vein, 22Squared has been testing a solution released by AdTheorent on Wednesday that uses machine learning to score programmatic inventory based on the probability that an impression will lead to a desired outcome. Southeast Toyota is also a launch partner for the product.
Top Learning Trends to Expect in 2023 - Beyond The Sky
What learning trends are in store for 2023? The coming year will bear witness to some significant changes across the entire business world. Recession poses a constant threat, so it's important that businesses equip themselves for the future. On top of it all, businesses are having trouble finding new recruits due to a labor shortage. So 2023 will force businesses to strengthen their current ranks to protect against a shallow recruitment pool.
AI system reconstructs words from brain data
Researchers demonstrate an AI system that can reconstruct semantic content in the form of text from fMRI data. A brain-computer interface that reconstructs language would have numerous applications in science, medicine, and industry. Invasive methods using recordings from surgically implanted electrodes show that it is possible to reconstruct language for simple brain control. But these interventions remain dangerous, even though companies like Elon Musk's Neuralink are working on methods to make such interventions as harmless as possible and without consequential damage. Non-invasive language decoders, however, could become commonplace and help people in the future to control technical devices by thought, for example.